CASAPose: Class-Adaptive and Semantic-Aware Multi-Object Pose Estimation
Applications in the field of augmented reality or robotics often require
joint localisation and 6D pose estimation of multiple objects. However, most
algorithms need one network per object class to be trained in order to provide
the best results. Analysing all visible objects demands multiple inferences,
which is memory and time-consuming. We present a new single-stage architecture
called CASAPose that determines 2D-3D correspondences for pose estimation of
multiple different objects in RGB images in one pass. It is fast and memory
efficient, and achieves high accuracy for multiple objects by exploiting the
output of a semantic segmentation decoder as control input to a keypoint
recognition decoder via local class-adaptive normalisation. Our new
differentiable regression of keypoint locations significantly contributes to a
faster closing of the domain gap between real test and synthetic training data.
We apply segmentation-aware convolutions and upsampling operations to increase
the focus inside the object mask and to reduce mutual interference of occluding
objects. For each inserted object, the network grows by only one output
segmentation map and a negligible number of parameters. We outperform
state-of-the-art approaches in challenging multi-object scenes with
inter-object occlusion and synthetic training. Comment: BMVC 2022, camera-ready
version (this submission includes the paper and supplementary material)
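The idea of using a segmentation output as a control input to another decoder via local class-adaptive normalisation can be illustrated with a minimal NumPy sketch. It is not the CASAPose layer itself: the function name `class_adaptive_norm`, the per-class parameter tables `gamma` and `beta`, and the choice of per-channel spatial normalisation are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def class_adaptive_norm(features, class_map, gamma, beta, eps=1e-5):
    """Modulate normalised features with per-class scale and shift.

    features:  (C, H, W) feature map (hypothetical keypoint-decoder activations)
    class_map: (H, W) integer class label per pixel (e.g. argmax of a segmentation)
    gamma:     (K, C) scale per class and channel (assumed learnable table)
    beta:      (K, C) shift per class and channel (assumed learnable table)
    """
    # Parameter-free step: normalise each channel over all spatial positions.
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normed = (features - mean) / (std + eps)

    # Look up scale and shift for the class predicted at every pixel,
    # then move the channel axis to the front to match the feature layout.
    scale = np.moveaxis(gamma[class_map], -1, 0)  # (C, H, W)
    shift = np.moveaxis(beta[class_map], -1, 0)   # (C, H, W)
    return normed * scale + shift
```

In this sketch, adding an object class only adds one row to `gamma` and one to `beta`, which is one plausible reading of why the parameter growth per object can stay small.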
BTSeg: Barlow Twins Regularization for Domain Adaptation in Semantic Segmentation
Semantic image segmentation is a critical component in many computer vision
systems, such as autonomous driving. In such applications, adverse conditions
(heavy rain, night time, snow, extreme lighting) pose specific challenges,
yet are typically underrepresented in the available datasets.
Generating more training data is cumbersome and expensive, and the process
itself is error-prone due to the inherent aleatoric uncertainty. To address
this challenging problem, we propose BTSeg, which exploits image-level
correspondences as a weak supervision signal to learn a segmentation model that
is agnostic to adverse conditions. To this end, our approach uses the Barlow
Twins loss from the field of unsupervised learning and treats images taken at
the same location but under different adverse conditions as "augmentations" of
the same unknown underlying base image. This allows the training of a
segmentation model that is robust to appearance changes introduced by different
adverse conditions. We evaluate our approach on ACDC and the new challenging
ACG benchmark to demonstrate its robustness and generalization capabilities.
Our approach performs favorably when compared to the current state-of-the-art
methods, while also being simpler to implement and train. The code will be
released upon acceptance.
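For readers unfamiliar with the Barlow Twins loss mentioned above, the following is a minimal NumPy sketch of its standard form: embeddings of two views are standardised per feature, their cross-correlation matrix is computed, and the loss pushes that matrix towards the identity. How BTSeg obtains the embeddings from image pairs is not described in the abstract, so `z_a` and `z_b` are simply assumed inputs, and the weight `lambda_offdiag` is an illustrative value.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-9):
    """Barlow Twins loss for two batches of embeddings of shape (N, D).

    z_a, z_b: embeddings of two "views" of the same samples (assumed inputs,
              e.g. the same location under two different conditions).
    lambda_offdiag: weight of the redundancy-reduction term (illustrative value).
    """
    n = z_a.shape[0]
    # Standardise every feature dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)

    # Cross-correlation matrix between the two views, shape (D, D).
    c = z_a.T @ z_b / n

    # Invariance term: diagonal entries should be 1.
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)
    # Redundancy-reduction term: off-diagonal entries should be 0.
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)
    return on_diag + lambda_offdiag * off_diag
```

The loss is small when both views yield the same, decorrelated features, and large when the two sets of embeddings are unrelated.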
Improved Hand-Tracking Framework with a Recovery Mechanism
Hand-tracking is fundamental to translating sign language to a spoken language. Accurate and reliable sign language translation depends on effective and accurate hand-tracking. This paper proposes an improved hand-tracking framework that adds a tracking recovery algorithm to a previous framework so that it handles occlusion better. The recovery algorithm improves both the discrimination between hands and the tracking of hands. The framework was evaluated on 30 South African Sign Language phrases that use: a single hand; both hands without occlusion; and both hands with occlusion. Ten individuals in constrained and unconstrained environments performed the gestures. Overall, the proposed framework achieved an average success rate of 91.8% compared to an average success rate of 81.1% using the previous framework. The results show an improved tracking accuracy across all signs in constrained and unconstrained environments.
Sequential Quantum Teleportation of Optical Coherent States
We demonstrate a sequence of two quantum teleportations of optical coherent
states, combining two high-fidelity teleporters for continuous variables. In
our experiment, the individual teleportation fidelities are evaluated as F_1 =
0.70 \pm 0.02 and F_2 = 0.75 \pm 0.02, while the fidelity between the input and
the sequentially teleported states is determined as F^{(2)} = 0.57 \pm 0.02.
This still exceeds the optimal fidelity of one half for classical teleportation
of arbitrary coherent states and almost attains the value of the first
(unsequential) quantum teleportation experiment with optical coherent states. Comment: 5 pages, 4 figures
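The reported numbers can be checked for rough consistency under a simple model that the abstract does not itself state: assume each teleporter runs at unity gain and adds independent, phase-insensitive Gaussian noise. For a coherent-state input the fidelity is then F = 1/(1 + n), where n is the added thermal noise in photon-number units (n = 1 reproduces the classical limit of one half), and the noise of the two stages simply adds:

```latex
\frac{1}{F^{(2)}} = \frac{1}{F_1} + \frac{1}{F_2} - 1
\approx \frac{1}{0.70} + \frac{1}{0.75} - 1 \approx 1.76,
\qquad
F^{(2)} \approx 0.57
```

Under these assumptions the predicted value agrees with the measured F^{(2)} = 0.57 \pm 0.02.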
Hyperspectral Demosaicing of Snapshot Camera Images Using Deep Learning
Spectral imaging technologies have rapidly evolved during the past decades.
The recent development of single-camera-one-shot techniques for hyperspectral
imaging allows multiple spectral bands to be captured simultaneously (3x3, 4x4
or 5x5 mosaic), opening up a wide range of applications. Examples include
intraoperative imaging, agricultural field inspection and food quality
assessment. To capture images across a wide spectrum range, i.e. to achieve
high spectral resolution, the sensor design sacrifices spatial resolution. With
increasing mosaic size, this effect becomes increasingly detrimental.
Furthermore, demosaicing is challenging. Without incorporating edge, shape, and
object information during interpolation, chromatic artifacts are likely to
appear in the obtained images. Recent approaches use neural networks for
demosaicing, enabling direct information extraction from image data. However,
obtaining training data for these approaches poses a challenge as well. This
work proposes a parallel neural-network-based demosaicing procedure trained on
a new ground truth dataset captured in a controlled environment by a
hyperspectral snapshot camera with a 4x4 mosaic pattern. The dataset is a
combination of real captured scenes with images from publicly available data
adapted to the 4x4 mosaic pattern. To obtain real-world ground-truth data, we
performed multiple camera captures with 1-pixel shifts in order to compose the
entire data cube. Experiments show that the proposed network outperforms
state-of-the-art networks. Comment: German Conference on Pattern Recognition (GCPR) 202
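The trade-off between spectral and spatial resolution in a mosaic snapshot sensor can be made concrete with a short NumPy sketch. It only separates a raw mosaic frame into its bands without any interpolation; it is not the proposed network. The function name `mosaic_to_bands` and the row-major band ordering inside each tile are assumptions made for illustration.

```python
import numpy as np

def mosaic_to_bands(raw, n=4):
    """Split a raw (H, W) mosaic frame into an (H//n, W//n, n*n) band cube.

    Each n x n tile of the sensor holds one sample of every spectral band,
    so each band is only sampled at every n-th pixel in both directions.
    The band order (row-major within a tile) is an assumed convention.
    """
    h, w = raw.shape
    # Crop so that the frame contains only complete n x n tiles.
    raw = raw[: h - h % n, : w - w % n]
    # Band (r, c) consists of the pixels at offset (r, c) inside every tile.
    bands = [raw[r::n, c::n] for r in range(n) for c in range(n)]
    return np.stack(bands, axis=-1)
```

For a 4x4 pattern, an 8x8 raw frame yields a 2x2x16 cube: sixteen bands, each at a quarter of the original resolution along each axis. Demosaicing aims to recover the missing samples, which the shifted captures described above provide as ground truth.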